---
title: >-
  LobeChat Supports Multimodal Interaction: Visual Recognition Enhances Intelligent Dialogue

description: >-
  LobeChat supports various large language models with visual recognition capabilities, allowing users to upload or drag and drop images. The assistant will recognize the content and engage in intelligent dialogue, creating a more intelligent and diverse chat environment.

tags:
  - Visual Recognition
  - LobeChat
  - GPT-4 Vision
  - Google Gemini Pro
  - Multimodal Interaction
---

# Supported Models for Visual Recognition

LobeChat supports several large language models with visual recognition capabilities, including OpenAI's [`gpt-4-vision`](https://platform.openai.com/docs/guides/vision), Google Gemini Pro Vision, and Zhipu GLM-4 Vision, which gives LobeChat multimodal interaction capabilities. Users can upload images or drag and drop them into the chat window; the assistant then recognizes the image content and carries on an intelligent dialogue about it, making for a smarter and more diverse chat experience.
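As a rough illustration of what an image-plus-text turn looks like when it reaches a vision model, the sketch below builds a user message in the content-parts shape described in OpenAI's vision guide (a `text` part followed by `image_url` parts). The helper names `buildVisionMessage` and `toDataUrl` are hypothetical, and this is not LobeChat's internal implementation; other providers such as Gemini or GLM-4 Vision may expect a different payload.

```typescript
// Content parts as described in OpenAI's vision guide.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: wrap base64 image data (e.g. from an uploaded or
// dropped file) in a data URL so it can be sent inline.
function toDataUrl(base64: string, mimeType: string): string {
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: combine the user's prompt with one or more images
// into a single multimodal user message.
function buildVisionMessage(prompt: string, imageUrls: string[]): UserMessage {
  const images: ImagePart[] = imageUrls.map((url) => ({
    type: "image_url",
    image_url: { url },
  }));
  return { role: "user", content: [{ type: "text", text: prompt }, ...images] };
}

// Example: one dropped PNG (placeholder base64) plus a question about it.
const exampleMessage = buildVisionMessage("What is shown in this image?", [
  toDataUrl("iVBORw0KGgo=", "image/png"),
]);
```

The resulting `exampleMessage` could then be placed in the `messages` array of a chat completion request to a vision-capable model.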

This feature opens up new ways to interact: communication is no longer limited to text and can include rich visual elements. Whether you are sharing images in everyday use or interpreting graphics in a specific industry, the assistant can work with the image as part of the conversation. In addition, LobeChat offers a selection of high-quality voice options (OpenAI Audio, Microsoft Edge Speech) for users from different regions and cultural backgrounds. Users can choose a voice that suits their personal preferences or the context at hand for a more personalized communication experience.
